Endogeneity in Logistic Regression Models
Abstract
To the Editor: Ethelberg et al. (1) report on a study of the determinants of hemolytic uremic syndrome resulting from Shiga toxin–producing Escherichia coli. The dataset is relatively small, and the authors use stepwise logistic regression models to detect small differences. This indicates that the authors were aware of the limits on the study's statistical power. Despite this, the study has an analytic flaw that seriously reduces that power.

An often overlooked problem in building statistical models is that of endogeneity, a term arising from econometric analysis, in which the value of one independent variable is dependent on the value of other predictor variables. Because of this endogeneity, significant correlation can exist between the unobserved factors contributing to both the endogenous independent variable and the dependent variable, which results in biased estimators (incorrect regression coefficients) (2). Additionally, the correlation between the independent variables can create significant multicollinearity, which violates the assumptions of standard regression models and results in inefficient estimators. This problem is shown by model-generated coefficient standard errors that are larger than the true standard errors, which biases the interpretation toward the null hypothesis and increases the likelihood of a type II error.
As a result, the power of the test of significance for an independent variable $X_1$ is reduced by a factor of $(1 - r^2_{(1|2,3,\ldots)})$, where $r_{(1|2,3,\ldots)}$ is defined as the multiple correlation coefficient for the model $X_1 = f(X_2, X_3, \ldots)$, and all $X_i$ are independent variables in the larger model (3,4). The results of this study clearly show that the presence of bloody diarrhea is an endogenous variable in the model showing predictors of hemolytic uremic syndrome, in that the diarrhea is shown to be predicted by, and therefore strongly correlated with, several other variables used to predict hemolytic uremic syndrome. Similarly, Shiga toxin 1 and 2 (stx1, stx2) genes are expected to be key predictors of the presence of bloody diarrhea, independent of strain, …
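The factor $(1 - r^2_{(1|2,3,\ldots)})$ described in the letter can be illustrated numerically. The sketch below is not taken from the letter: it uses simulated data, and the function name `multiple_r_squared` and the coefficients that generate `x1` are assumptions made purely for illustration. It regresses one predictor on the others by ordinary least squares, reports the squared multiple correlation, and also prints its reciprocal counterpart, commonly called the variance inflation factor.

```python
import numpy as np

def multiple_r_squared(X, j):
    """R^2 from regressing column j of X on the remaining columns plus an intercept."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares fit
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

# Hypothetical data: x1 is largely determined by x2 and x3 (an "endogenous" predictor).
rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.8 * x2 + 0.5 * x3 + rng.normal(scale=0.5, size=n)
X = np.column_stack([x1, x2, x3])

r2 = multiple_r_squared(X, 0)   # r^2_(1|2,3)
retained = 1.0 - r2             # the factor (1 - r^2) from the text
vif = 1.0 / retained            # variance inflation factor (standard reciprocal form)

print(f"r^2 for X1 given X2, X3: {r2:.3f}")
print(f"factor (1 - r^2):        {retained:.3f}")
print(f"variance inflation:      {vif:.2f}")
```

When `x1` is nearly a function of the other predictors, `r2` approaches 1 and the retained factor approaches 0, which is the loss of power the letter describes. For predictors that are unrelated to one another the factor stays close to 1.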